Identifying Multidocument Relations
Abstract
The digital world generates an enormous and growing volume of information, much of it redundant, complementary, or contradictory, and often produced by several sources. Applications such as multidocument summarization and question answering must handle this information, and they need to identify the relations among the various texts in order to accomplish their tasks. In this paper we first describe an effort to create and annotate a corpus of news texts with multidocument relations from the Cross-document Structure Theory (CST), and then present a machine learning experiment for the automatic identification of some of these relations. We show that our results for both tasks are satisfactory.
Similar resources
Integrating the UMLS into an RDF-Based Biomedical Knowledge Repository.
As part of Advanced Library Services project at the National Library of Medicine, we are creating a very large Biomedical Knowledge Repository (BKR), which serves as background knowledge for applications including knowledge discovery and multidocument summarization [1]. The BKR integrates relations extracted from the biomedical literature (e.g., Medline citations) and from structured knowledge ...
Towards Multidocument Summarization by Reformulation: Progress and Prospects
By synthesizing information common to retrieved documents, multi-document summarization can help users of information retrieval systems to find relevant documents with a minimal amount of reading. We are developing a multidocument summarization system to automatically generate a concise summary by identifying and synthesizing similarities across a set of related documents. Our approach is uniqu...
Multi-Document Discourse Parsing Using Traditional and Hierarchical Machine Learning
Multi-document handling is essential today, when many documents on the same topic are produced, especially considering the Web. Both readers and computer applications can benefit from a discourse analysis of this multidocument content, since it demonstrates clearly the relations among portions of these documents. This work aims to identify such relations automatically using machine learning tec...
Paraphrasing and Translation
Usefulness of paraphrases
• Paraphrases are alternative ways of conveying the same information
• Useful in NLP applications such as:
  – Generation: producing paraphrases allows for the creation of more varied and fluent text
  – Multidocument summarization: identifying paraphrases allows information repeated across documents to be condensed
  – Question answering: paraphrasing is important when going be...
Machine and Human Performance for Single and Multidocument Summarization
coherency, and be able to draw the “best” information from a set of documents. Automatic single-document text summarization [1] has been an active research area since the 1950s, with a renaissance of approaches since the 1990s. Human single-document summarization is well defined when guidelines and recommendations drive performance [2,3]. System-generated single-document summaries, while not always ma...